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Abstract 

The  Bayes  factor  is  a  useful  summary  for  model  selection.  Calculation  of  this  measure 
involves  evaluating  the  integrated  likelihood  (or  prior  predictive  density),  which  can  be  esti¬ 
mated  from  the  output  of  MCMC  and  other  posterior  simulation  methods  using  the  harmonic 
mean  estimator.  While  this  is  a  simulation-consistent  estimator,  it  can  have  infinite  vari¬ 
ance.  In  this  article  we  describe  a  method  to  stabilize  the  harmonic  mean  estimator.  Under 
this  approach,  the  parameter  space  is  reduced  such  that  the  modified  estimator  involves 
a  harmonic  mean  of  heavier  tailed  densities,  thus  resulting  in  a  finite  variance  estimator. 
We  discuss  general  conditions  under  which  this  reduction  is  applicable  and  illustrate  the 
proposed  method  through  several  examples. 


Keywords:  Bayes  factor,  Beta-binomial,  Integrated  likelihood,  Poisson-Gamma  distribution, 
Statistical  genetics,  Variance  reduction. 
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1  Introduction 


The  integrated  likelihood  is  an  important  quantity  in  model  comparison  —  it  is  the  key 
component  of  the  Bayes  factor  for  example  (Kass  and  Raftery  1995).  Consider  data  y ,  a 
likelihood  function  Tr{y\6)  from  a  model  for  y  indexed  by  a  parameter  9,  in  which  both  y  and 
9  may  be  vector- valued,  and  a  prior  distribution  tt(6).  The  integrated  likelihood  of  y  is  then 
defined  as 

7T (y)  =  J  Tr(y\9)TT(9)  dO. 

The  integrated  likelihood  normalizes  the  product  of  the  likelihood  and  the  prior  in  forming 
the  posterior  density  Tr{9\y).  Furthermore,  as  a  function  of  y  prior  to  data  collection,  7r (y) 
is  the  prior  predictive  density. 

Evaluating  the  integrated  likelihood  can  present  a  difficult  computational  problem.  New¬ 
ton  and  Raftery  (1994)  showed  that  tt  (y)  can  be  expressed  as  an  expectation  with  respect  to 
the  posterior  distribution  of  the  parameter,  thus  motivating  an  estimate  based  on  a  Monte 
Carlo  sample  from  the  posterior.  By  Bayes’s  theorem, 


n{6\y) 
n(y)  J  Tr(y\9) 
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suggesting  that  the  integrated  likelihood  tt  (y)  can  be  approximated  by  the  harmonic  mean 

n  -1 


7Thm(3/) 
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based  on  B  draws  61 ,  92 , . . . ,  6B  from  the  posterior  distribution  tt (9\y).  This  sample  might 
come  out  of  a  standard  Markov  chain  Monte  Carlo  implementation,  for  example.  Though 
^hmC?/)  is  consistent  as  the  simulation  size  B  increases,  its  precision  is  not  guaranteed. 

The  simplicity  of  the  harmonic  mean  estimator  (2)  is  its  main  advantage  over  other  more 
specialized  techniques  (e.g.  Chib  1995,  Raftery  1996,  Lewis  and  Raftery  1997,  DiCiccio  et  al. 
1997).  It  uses  only  within-model  posterior  samples  and  likelihood  evaluations  which  are  often 
available  anyway  as  part  of  posterior  sampling.  A  major  drawback  of  the  harmonic  mean 
estimator  is  its  computational  instability.  The  estimator  is  consistent  but  may  have  infinite 
variance  (measured  by  Var{[7r(y|6l)]_1jy})  across  simulations,  even  in  simple  models.  When 
this  is  the  case,  one  consequence  is  that  when  the  cumulative  estimate  of  the  harmonic  mean 
estimate  (2)  based  on  the  first  B  draws  from  the  posterior  is  plotted  against  B,  the  plot 
has  occasional  very  large  jumps,  and  looks  unstable.  In  this  article  we  present  a  method 
to  stabilize  the  harmonic  mean  estimator.  We  develop  general  conditions  under  which  this 
method  works  and  we  demonstrate  the  method  in  several  examples. 
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2  Stabilizing  the  Harmonic  Mean  Estimator 


An  overly  simple  but  helpful  example  to  illustrate  our  proposed  method  is  the  model  in 
which  9  =  (/i,  0)  records  the  mean  and  precision  of  a  single  normally  distributed  data  point 
y.  A  conjugate  prior  is  0  ~  Gamma  (a/2,  ct/2),  and 


(/•#)  ~  Normal (/x0,n0ip) 


where  a,  n0,  and  /r0  are  hyperparameters  (e.g.,  Bernardo  and  Smith,  1994,  page  268  or 
Appendix  I).  The  integrated  likelihood  tt (y)  is  readily  determined  to  be  the  ordinate  of  a 
t  density,  St(yj/i0,  n0/(n0  +  1 ) ,  ck)  in  the  notation  of  Bernardo  and  Smith  (1994,  page  122 
or  Appendix  I).  Were  we  to  approximate  tt (y)  using  equation  (2),  instead  of  taking  the 
analytically  determined  value,  we  could  measure  the  stability  of  the  estimator  with  the 
variance  Var{[7r(yj#)]_1|y}.  This  variance,  in  turn,  is  determined  by  the  second  noncentral 
moment  E{[ir(y\6)]-'2\y}  which  is  proportional  to 


J  j  ?pa/2  exp  |  ^[(y  -  /i)2  -  n0(n  -  y0)2  -  a]  !>  dtpdn. 


and  which  is  infinite  in  this  example  owing  to  the  divergence  of  the  integral  in  y  for  each 
0.  The  reciprocal  of  the  light-tailed  normal  density  forms  too  large  an  integrand  to  yield  a 
finite  posterior  variance,  and  hence  the  harmonic  mean  estimator  is  unstable. 

An  alternative  estimator,  supported  equally  by  the  basic  equation  (1),  is 

-i 


^SHM(y) 
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E 
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(3) 


which  we  call  a  stabilized  harmonic  mean.  In  (3),  //  is  the  mean  component  of  9l  = 
and  thus  is  a  draw  from  the  marginal  posterior  distribution  7r(/r| y).  The  stabilized  harmonic 
mean  is  formed  not  from  standard  likelihood  values,  but  rather  from  marginal  likelihoods 
obtained  by  integrating  out  the  precision  parameter  0.  It  is  straightforward  to  show  that 
this  marginal  likelihood  has  the  form  of  a  t  ordinate, 


Tr(y\y)  =  St  {y j/i,  ( a  +  l)/[a  +  n0(y  -  /i0)2],  a  +  l}  . 


The  intuition  motivating  (3)  is  that  since  7r(y\fi)  has  a  heavier  tail  than  Ti(y\6),  averages 
of  reciprocal  ordinates  become  averages  of  less  variable  quantities  than  in  (2).  Measuring 
stability  as  above,  we  observe  that 


E{[*(y\n)]  2\y}  oc  J 


{1  +  [{y  -  yf  +  n0(/i  -  ^0)2]/a}a/2+1 
{1  +  Tio(y,  —  Ho)2  /  a}a+1 


dy 


(4) 
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is  finite  when  a  >  1  and  n0  >  0.  This  result  is  proved  in  Appendix  II. 

Figure  1  compares  the  harmonic  mean  7Thm  (y)  to  the  stabilized  harmonic  mean  7TsHM(y) 
for  various  parameter  settings  of  this  simple  normal  example.  For  each  case,  both  estimates 
use  a  common  sample  of  B  =  5,  000  independent  and  identically  distributed  posterior  draws 
for  the  mean  n  and  precision  ip.  Shown  for  each  sample  is  the  value  of  both  estimators  using 
ever  larger  amounts  of  the  sample.  Figure  1  shows  clearly  how  the  infinite  variance  of  the 
harmonic  mean  estimator  manifests  itself  in  practice.  Every  so  often  a  parameter  value  with 
a  very  small  likelihood  is  generated  from  the  posterior,  and  this  yields  a  very  large  value 
of  the  reciprocal  of  the  likelihood,  which  in  turn  greatly  reduces  7Th m{v)-  Subsequently, 
^HM(y)  increases  gradually,  until  another  very  small  likelihood  is  encountered.  Improved 
performance  of  the  stabilized  harmonic  mean  is  evident  in  Figure  1.  The  t-based  estimator 
^SHM(y)  converges  much  more  rapidly  than  the  standard  estimator,  and  does  not  exhibit 
the  same  pattern  of  occasional  massive  changes.  To  further  validate  this  observation,  we 
recomputed  both  final  estimators  on  100  independent  posterior  samples  of  size  B  =  5000 
(Figure  2).  Relative  stability  of  the  7Tshm (y)  is  clearly  indicated. 

The  multivariate  normal  model  is  a  direct  extension  of  the  univariate  normal  example 
discussed  above.  The  standard  estimator,  obtained  using  equation  (2),  is  a  harmonic  mean 
of  multivariate  normal  densities.  This  can  be  easily  shown  to  be  an  unstable  estimator  of 
the  prior  predictive  density.  Integrating  the  precision  parameter  leads  to  a  heavier  tailed 
multivariate  t  density,  which  can  be  used  to  obtain  a  stable  estimator  analogous  to  equation 
(3). 

The  stabilized  harmonic  mean  was  first  reported  in  a  statistical  genetics  application  in 
which  numerical  stability  of  a  t— based  harmonic  mean  was  observed  (Satagopan,  Yandell, 
Newton,  and  Osborn  1996).  §3  presents  a  detailed  study  of  this  case.  Although  the  genet- 
ical  model  used  by  these  authors  is  rather  specialized,  the  method  to  obtain  a  more  stable 
estimate  is  quite  general:  approximate  7i(y)  by  a  harmonic  mean  of  values  irlylh^)]  where 
Q1,  92, . . . ,  6B  form  a  sample  from  the  posterior  distribution  tt(6\ y).  The  function  h(6)  must 
reduce  the  parameter  space  as  much  as  possible,  while  not  making  the  calculation  of  the 
marginal  likelihood  7r[y\h(6)]  too  difficult.  In  the  examples  we  work  out,  h(6)  is  of  lower 
dimension  than  0,  typically  obtained  by  integrating  out  one  or  several  of  the  components. 
Taking  h(6)  to  be  constant  is  an  extreme  case;  7r[y\h(0)]  then  becomes  the  integrated  like¬ 
lihood  7i (y).  Of  course,  if  this  were  computable  there  would  be  no  need  to  calculate  an 
approximation,  and  in  any  case,  the  harmonic  mean  estimator  would  have  zero  variance.  To 
form  harmonic  means  from  reduced  distributions  is  a  general  variance  reduction  technique. 
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Simulations 


Figure  1:  Logarithm  of  normal  (bold  line)  and  t-based  (dotted  line)  harmonic  mean  estimates 
compared  with  the  true  value  (dashed  line),  when  the  data  y  follow  a  univariate  normal 
distribution  as  discussed  in  §1.  The  top  row  of  the  figure  displays  the  harmonic  mean 
estimates  when  y  —  5  and  /i0  =  0.  The  second  row  corresponds  to  y  =  0  and  jU0  =  5.  The 
bottom  row  gives  the  figures  for  y  =  0  and  jU0  =  0.  The  3  columns  columns  correspond  to  a 
values  of  2,  6  and  10.  The  value  of  n0  is  1. 
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log  harmonic  mean 


Figure  2:  Boxplots  to  assess  the  variability  of  the  estimated  integrated  likelihood.  Shown 
are  the  true  integrated  likelihood,  and  the  normal  and  t-based  harmonic  mean  estimators 
on  the  logarithmic  scale.  The  estimates  are  obtained  from  100  Monte  Carlo  samples  of  size 
5000.  These  estimates  are  shown  for  various  configurations  of  parameters  as  in  Figure  1. 
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Theorem  1  If  h  is  a  measurable  function  of  9  then 


Var 


1 

n[y\h(6)] 


<  Var 


Either  variance  may  be  infinite.  If  the  left  hand  side  is  infinite,  then  the  right  hand  side  is 
infinite  also. 


To  avoid  measure-theoretic  considerations,  we  prove  Theorem  1  only  under  the  additional 
condition  that  h(9)  is  a  dimension-reducing  transformation:  i.e.  9  =  (a,/3),  h(9)  =  a, 
and  both  a  and  j3  range  freely  so  that  the  prior  density  7 r(9)  =  7r(a:)7r(/3| a)  is  well-defined. 
See  Appendix  III  for  a  proof.  In  certain  hierarchical  models,  where  analytical  integration  is 
possible  on  one  or  two  levels,  it  may  be  possible  to  identify  useful  reductions  h(9)  to  facilitate 
stable  harmonic  mean  calculations. 

Gelfand  and  Dey  (1994)  noted  an  extension  of  the  basic  identity  (1)  which  justifies 
estimating  the  integrated  likelihood  by  the  harmonic  mean  of  n(y\9t)/K(9t)/f(9t)  where,  as 
before,  the  9t's  are  sampled  from  the  posterior,  but  now  tt(9)  is  the  prior  density  and  f(9) 
is  any  (normalized)  density  on  the  parameter  space.  The  idea  is  to  choose  /  carefully  so 
as  to  minimize  Monte  Carlo  error.  We  show  in  §4  that  our  proposed  stabilization  can  be 
combined  with  this  technique  for  improved  performance.  Indeed  there  is  some  synergy  in 
this  combination  because  the  proposed  stabilization  reduces  the  dimension  of  9,  thus  making 
it  simpler  to  identify  a  useful  /  function. 


3  Statistical  Genetics 

Linear  models  are  used  frequently  in  quantitative  genetics  to  relate  variation  in  a  measured 
trait  (phenotype)  to  variation  in  underlying  genes  affecting  the  trait  (genotype);  Doerge, 
Zeng,  and  Weir  (1997),  for  example,  is  a  useful  review  from  a  statistics  perspective.  We 
reconsider  the  particular  model 

5 

Vi  =  h  +  J2aj9i,j+eii  i=1  (5) 

i=1 

used  by  Satagopan  et  al.  (1996)  to  infer  the  genetic  causes  of  variation  in  the  time-to- 
flowering  phenotype  in  the  plant  species  Brassica  napus.  In  (5),  the  i  indicates  different 
plants  in  a  sample  of  size  n  =  105,  the  phenotypes  y  =  (yf)  are  the  logarithms  of  the 
times  to  flowering,  and  the  decomposition  on  the  right-hand-side  characterizes  the  expected 
phenotype  conditional  on  the  genotype  g,  =  (gij)  at  a  set  of  s  different  genetic  loci.  Here  e* 
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is  modeled  as  a  mean  zero  normally  distributed  disturbance  with  variance  a2  independent 
of  genetic  factors,  y  is  the  marginal  expected  phenotype  and  a3  is  the  genetic  effect  of  the 
jth  quantitative  trait  locus  (QTL).  From  the  particular  experimental  design,  each  genotype 
gij  takes  one  of  two  possible  values,  coded  as  {  —  1, 1},  with  equal  marginal  probability. 

The  model  (5)  would  be  rather  standard  except  that  the  genotypes  g  =  ( g* )  are  unob¬ 
served;  in  fact,  for  each  i  they  represent  the  values  of  a  random  process  defined  over  the  whole 
genome  and  evaluated  at  s  distinct  positions  A  =  (Ai, . . . ,  As),  the  s  putative  QTLs.  The 
number  of  QTLs,  s.  is  unknown,  as  are  their  positions  A  and  their  effects  a  =  (aq, . . . ,  as). 
Indirect  information  about  the  QTL  genotypes  comes  through  genotype  data  m  =  ( rrii )  ob¬ 
tained,  in  this  example,  from  a  panel  of  10  molecular  markers  in  the  chromosomal  region  of 
interest.  The  statistical  problem  is  to  infer  the  unknown  parameters  9  =  (y,  a,  A,  a2)  from 
marker  and  phenotype  data  ( m,y ),  and  considering  missing  genotypes  g. 

Satagopan  et  al.  (1996)  presented  a  Bayesian  solution  in  which  Markov  chain  Monte  Carlo 
(MCMC)  was  used  to  sample  the  posterior  distribution  of  all  the  unknowns  conditional  on 
s,  the  number  of  QTLs,  separately  for  a  range  of  values  of  s.  To  infer  s,  the  integrated 
likelihood  7r(y\m,s)  was  approximated  for  each  s  via  a  harmonic  mean,  and  this  enabled 
calculation  of  Bayes  factors 


BF(si,  s2)  =  7r(y\m,  si)/7r(j/|m,  s2).  (6) 

We  reconsider  this  calculation  in  further  detail.  (Note  that  we  can  condition  on  marker  in¬ 
formation  m  because  its  marginal  distribution  7 r(m)  is  not  dependent  on  any  of  the  unknown 
parameters.) 

The  prior  for  9  factorizes  into  a  uniform  prior  over  ordered  loci  A  =  (Ai, . . . ,  As)  within 
the  chromosomal  region  under  consideration  and  a  conjugate  prior  for  y,  a  =  (cej),  and  a2: 

n(y\a2)  =  Normal(//0,  cr2/n0), 

7r(aj\a2)  =  Norma^oioj,  a2/n0j),  j  = 

7T (cr2)  =  Inverse  Gamma(C/2,  C/2), 

where  y0  =  5,  n0  =  1,  =  5,  n0j  =  1,  for  each  j  and  C  =  8.  Fixing  the  number  of  loci 

s,  one  complete  scan  of  the  MCMC  sampler  updates  each  element  of  9  and  all  the  missing 
genotypes  in  g.  See  Satagopan  et  al.  (1996)  for  further  details  on  the  component  updates.  A 
total  of  3  chains,  corresponding  to  s  =  1,  2,  and3,  were  obtained.  For  a  fixed  s  (=  1,  2,  or  3), 
we  report  results  below  based  on  a  chain  of  length  4000  x  100  complete  scans,  subsampled 
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every  100  scans,  with  the  first  100  saved  states  removed  as  burn-in.  This  corresponds  to  an 
effective  independent  sample  size  of  about  3900  for  estimating  the  genetic  effect  parameters. 

Unknowns  ( 9 *,  gl)  are  sampled  from  their  posterior  distribution  conditional  on  observed 
phenotypes  y ,  marker  genotypes  m  and  the  model  dimension  parameter  s.  Invoking  the 
standard  harmonic  mean  argument,  as  in  (2),  we  approximate  ir(y\m,  s )  by 


KmA{y\m,s) 


]  B  j 

-Y _ - _ 

B  n(y\m16t1gt1s) 


As  in  the  simple  normal  example  of  §2,  a  problem  arises  with  (7)  because  we  are  averaging- 
reciprocals  of  normal  ordinates.  To  stabilize  the  estimator,  we  integrate  out  the  variance 
parameter  a2: 


nSKM(y\m,s) 


'1  B  1 

.B  S  n(y\m,  h^g*,  s) 


(8) 


where  h()  returns  all  components  of  9  except  the  variance  parameter.  In  (8), 

7r(y|m,  h(9t)1  gt1  s )  is  a  scaled  t,  density,  Stn(y\/j  +  ol g,  /,  £). 

Figure  3  shows  the  cumulative  Bayes  factor  estimates  obtained  from  three  chains,  (s  = 
1,2,  and  3),  based  on  integrated  likelihood  estimates  in  either  (7)  and  (8).  Evidently  the 
stabilization  has  worked  in  this  more  complicated  example;  there  are  fewer  massive  changes 
in  the  estimate.  Numerically,  we  obtain  BF(  1,2)  =  0.368  using  (7),  and  BF(  1,2)  =  0.395 
using  the  stabilized  estimator  (8).  The  estimates  of  BF( 2,3)  are  rather  more  disparate: 
13.15  and  and  4.39,  respectively.  In  any  case  we  would  conclude  that  the  two  locus  model  is 


most  likely  a  posteriori 


Figure  4  indicates  the  Monte  Carlo  sampling  variability  of  the  two  estimators.  The  above 
computations  were  done  on  75  repeated  runs.  To  reduce  the  computational  burden  of  the 
simulation,  we  used  a  value  of  B  just  half  of  the  earlier  value.  The  side-by-side  boxplots 


further  confirm  the  success  of  the  stabilization  in  the  present  example. 

We  note  that  other  dimension-reducing  transformations  h(-)  could  be  used  in  this  exam¬ 
ple.  For  example,  we  could  sum  out  the  genotype  values  g  and  thus  average  reciprocals  of 
finite  mixtures  of  normals  (or  t’s).  It  may  also  be  possible  to  integrate  the  genetic  effects  a. 
Neither  of  these  has  been  attempted  here. 


4  Beta— Binomial 

A  naturally  occurring  hierarchical  model  has  observable  counts  y  =  (y*),  i  =  1, . . . ,  m,  arising 
as  conditionally  independent  binomial  random  variables  with  numbers  of  trials  (n*)  and 
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Simulations 


Simulations 


Figure  3:  Bayes  factors  for  the  flowering  time  data  discussed  in  §2.  The  comparison  between 
one  locus  and  two  loci  models  is  shown  on  the  top.  The  bottom  figure  corresponds  to  the 
comparison  between  the  two  and  three  loci  models.  The  bold  line  is  the  standard  Bayes 
factor  estimate.  The  dotted  line  is  the  t-based  estimate. 
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-2  0  2  4  6 

log  Bayes  Factor  Estimate 


-2  0  2 
log  Bayes  Factor  Estimate 


Figure  4:  Assessing  the  variability  of  the  Bayes  factor  estimates  for  the  flowering  time  data 
using  75  repeated  chains.  Comparison  between  the  one  and  two  loci  models  are  shown 
on  the  top,  and  comparison  between  two  and  three  loci  models  are  shown  below.  In  each 
figure,  variability  among  t-based  estimates  is  shown  in  the  top  and  that  among  the  standard 
estimates  is  shown  in  the  bottom. 
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success  probabilities  p  =  ( Pi ).  In  turn,  these  (p-t)  are  modeled  as  conditionally  independent 
beta  variables  with  canonical  hyperparameters  a  and  b  say,  upon  which  some  further  prior 
distribution  7 r(o,  b )  is  placed.  To  obtain  the  probability  of  y  in  this  model,  we  must  integrate 
out  both  ( pi )  and  the  hyperparameters  a  and  b.  It  is  routine  to  sample  the  full  parameter 
set  9  =  ( p,a,b )  from  its  posterior  distribution  (Gelman,  Carlin,  Stern,  and  Rubin  1996). 
For  example,  an  MCMC  simulation  might  update  each  p*  from  its  Beta  full-conditional 
distribution,  and  then  resort,  perhaps,  to  a  random- walk  proposal  to  update  a  and  b. 

The  basic  harmonic  mean  combines  reciprocals  of  binomial  likelihoods  from  the  posterior 
sample,  and,  it  turns  out,  can  be  quite  unstable.  As  before,  stability  is  determined  by  the 
second  noncentral  moment 


E{[K(y\0)]  2I y)  °c  J  -  p)b~1~ni+yi  dp^ia,  b)  dadb. 


Unless  we  take  an  extreme  prior  it  (a,  b )  which  ensures  a  >  max(pj)  and  b  >  ma  x(nj  —  y*),  this 
integral  can  diverge.  The  resulting  prior  is  unrealistically  peaked,  which  is  unsatisfactory, 
ruling  out  the  standard  (unstabilized)  harmonic  mean  estimator  as  a  practical  tool  for  the 
beta-binomial  model. 

It  is  straightforward  to  stabilize  the  harmonic  mean  by  reducing  the  dimension  of  9  as 
in  previous  examples.  One  possibility  is  to  take  h(9)  =  ( a,b );  i.e.  to  integrate  out  all  the 
binomial  success  probabilities.  In  this  conjugate  structure,  we  have  a  closed  form  Beta- 
binomial  expression  for  7i{y \h(9)},  namely 


K{y\h{9)}  =  n 


T(fii  +  1) 


r(o  +  b)  r(a  +  yi)r(b  +  rii-yi) 


r(n*  -  yi  +  l)r(y*  +  1)  r(a  +  b  +  rii)  T(a) 


m 


(9) 


The  harmonic  mean  of  these  beta-binomial  probabilities,  calculated  from  the  (a,  6)’s  sampled 
from  their  posterior,  is  consistent  for  the  integrated  likelihood.  We  may  expect  this  to  be 
more  stable  since  the  beta-binomial  distribution  is  more  diffuse  than  the  binomial,  and  so 
the  reciprocals  of  the  probabilities  may  not  be  as  extreme.  The  stability  of  this  estimator  is 
determined  by  the  second  noncentral  moment,  which  satisfies 


U{[7r(y|a,  6)]  2|y}  <  J  (a  +  b  +  nmax  -  l)m7r(a,  b)  da  db, 

where  nm£tx  =  maxn*.  Stability  is  ensured  when  prior  moments  of  a  and  b  exist. 

Data  on  free-throw  percentages  from  the  National  Basketball  Association  (NBA)  provide 
an  interesting  demonstration  of  the  harmonic  mean  calculations.  On  March  9,  1999,  there 
were  414  active  NBA  players  of  whom  374  had  attempted  at  least  one  free  throw  by  that 
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point  in  the  season.  Among  these  374  players,  the  numbers  of  attempts  (n*)  ranged  from  1 
to  205,  with  a  mean  of  about  35.  We  model  y-t.  the  number  of  made  free  throws  by  player 
i.  to  be  Binomial  with  n*  trials  and  unknown  success  probability  p,.  The  average  free  throw 
percentage  yi/rii  is  about  70%  in  the  data  reported  at  www.yahoo.com  (and  available  from 
the  authors). 

We  consider  the  problem  of  evaluating  the  integrated  likelihood  tt (y)  under  the  hierar¬ 
chical  beta  binomial  model  given  above.  This  would  be  useful  when  comparing  this  model 
with  other  hypothesized  models  for  these  data.  We  place  independent  standard  exponential 
priors  on  a  —  e  and  b  —  e  where  e  =  1  is  a  lower  truncation  point  of  the  prior.  MCMC 
was  used  to  simulate  the  posterior.  The  following  numerical  results  are  based  on  a  single 
chain  of  length  50000  x  50  complete  scans,  subsampled  every  50  scans,  and  with  the  first 
100  saved  states  removed  as  burn-in.  Significant  trends  were  not  detected  in  the  output 
and  time-series  diagnostics  indicated  that  little  dependence  remained  in  the  saved  states. 
Computations  were  done  separately  on  a  second  run  and  we  saw  no  appreciable  differences 
(data  not  shown). 

Natural  logarithms  of  the  product  binomial  likelihood  and  the  product  beta-binomial 
likelihood  (9)  are  monitored  in  Figures  5a  and  5b.  From  these  values  we  obtain  the  standard 
harmonic  mean  estimate  and  the  stabilized  one.  The  log  estimates  are  -820.8  and  -944.2 
respectively;  these  are  quite  different.  The  standard  estimate  is  known  to  be  unstable. 
Indeed  the  variance  of  the  sampled  loglikelihood  values  is  143.8  while  that  of  the  sampled 
log  beta-binomial  values  is  only  4.0.  Variance  on  the  log  scale  does  not  tell  the  whole  story 
because  we  are  averaging  on  the  anti-log  scale;  it  is  outliers  (having  very  low  likelihood)  that 
are  particularly  influential,  but  still  variance  gives  some  indication. 

Suspecting  that  some  additional  improvements  could  be  made,  we  combined  the  stabiliza¬ 
tion  technique  with  the  method  discussed  at  the  end  of  §2,  using  a  Gaussian  approximation 
to  the  posterior  7r(a,  b\y)  as  the  density  /.  The  estimate  becomes  a  harmonic  mean  of  the 
values  7r(y\a,  b)7i(a,  b)/ f(a,  b ),  with  (a,  6)’s  sampled  from  their  posterior.  Figure  5c  shows  the 
time  series  of  adjusted  marginal  likelihoods.  The  main  advantage  of  this  adjustment  is  that 
now  the  influence  of  individual  sample  points  is  greatly  diminished.  The  estimated  log  inte¬ 
grated  likelihood  is  -952.6.  A  brute  force  grid-based  numerical  integration  of  7t(y\a,  b)7r(a,  b ) 
gave  -951.4.  Thus  we  see  that  the  initial  stabilization  method  worked  fairly  well  and  was 
easily  improved. 
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Figure  5:  The  top  panel,  middle  panel  and  bottom  panel  compare  the  standard  harmonic 
mean  estimator,  modified  estimator,  and  adjusted  estimator  based  on  Gelfand  and  Dey 
(1994),  respectively,  for  the  NBA  example. 
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5  Other  Reductions:  A  Simple  Poisson- Gamma  Model 


Sometimes  useful  reductions  are  hard  to  find,  and  the  natural  approach  we  have  considered 
of  integrating  out  a  parameter  does  not  work.  A  simple  example  is  when  y  has  a  Poisson 
distribution  with  mean  7A,  and  7  is  exponentially  distributed  with  mean  1  and  independent 
of  A  a  priori.  The  standard  harmonic  mean  estimator  of  7r (y)  uses  samples  9l  =  (A*,  7*) 
from  n(9\y),  and  averages  the  reciprocals  of  Poisson  probabilities.  Stability  depends  on  the 
second  noncentral  moment 

E  {  K( y\°)]~2\ y]  oc  J  J  exp{— 7(1  -  A)}tt(A)  d”jd\. 

Note  that  the  inner  integral  diverges  for  any  A  >  1,  so  that  the  standard  harmonic  mean  is 
unstable.  The  natural  reduction  would  be  to  take  h(9)  =  A.  Thus  the  marginal  likelihood 
n[y\h(9)]  =  7r(yjA)  is  a  geometric  distribution  Xy /(I  +  A)^+1^.  Stability  here  hinges  upon 

E  {  [^(^l^)]-2!  2/}  oc  J  (l  +  A)7r(A)dA. 

For  small  A,  the  dominant  term  of  the  integrand  is  7i(\)/\y,  and  so  stability  of  the  modified 
harmonic  mean  depends  on  the  prior,  though  for  a  standard  Gamma  prior,  for  example, 
this  integral  can  diverge.  In  other  words,  both  variances  in  Theorem  1  equal  infinity.  Thus 
integrating  out  7  does  not  produce  a  stabilized  harmonic  mean  estimator  in  this  case. 

Another,  further  reduction  does  work,  however.  Consider  the  case  where  A,  like  7,  has  a 
prior  exponential  distribution  with  mean  1.  Suppose  that  h{9)  =  0  if  A  <  e,  and  h(9 )  =  A 
if  A  >  e,  where  e  is  a  small  predetermined  constant.  Then  7i[y\h(9)  =  0]  ~  ey+1/(y  + 
1)  (better  approximations  are  readily  available  if  necessary),  and  it  is  easily  shown  that 
E {7r[y\h(9)]~2\y}  <  00.  Thus,  with  this  refinement,  the  modified  harmonic  mean  estimator 
is  stable. 

6  Discussion 

In  this  article  we  have  described  a  way  to  stabilize  the  harmonic  mean  estimator  of  the 
integrated  likelihood  (Newton  and  Raftery  1994)  by  taking  advantage  of  dimension-reducing 
transformations  on  the  parameter  space.  The  proposed  variance  stabilizing  method  extends 
a  very  simple  tool  into  a  range  of  widely  used  hierarchical  statistical  models.  As  illustrated  in 
§3  and  §4,  dimension  reduction  is  straight  forward  in  certain  hierarchical  models.  Sometimes 
the  natural  approach  of  integrating  out  a  nuisance  parameter  does  not  yield  a  stabilized 
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estimator,  however  ,  and  one  must  search  farther.  We  have  given  one  example,  a  simple 
Poisson-Gamma  model,  in  §5  where  the  natural  approach  does  not  work  directly,  but  a 
slight  refinement  of  the  h(-)  function  does  yield  a  stabilized  estimator.  The  trick  used  there 
to  find  this  refined  h  function  was  based  on  the  fact  that  the  estimator  is  stable  if  and  only 
if  E {7r[y\h(6)]-'2\y}  <  oo.  We  wrote  this  expectation  as  an  integral,  identified  the  part  of 
the  range  of  integration  responsible  for  the  integral  being  infinite,  and  effectively  carried  out 
the  integration  over  that  small  part  of  the  space  via  analytic  approximation,  thus  defining 
a  new  h  function.  Dimension  reduction  for  variance  stabilization  may  not  be  an  effective 
method  to  compute  normalizing  constants  in  certain  very  hard  problems.  In  the  cases  we 
have  studied,  we  have  shown  that  it  is  possible  to  stabilize  the  harmonic  mean  estimator 
and  obtain  estimates  that  are  much  more  accurate,  but  still  easy  to  calculate. 

Gelfand  and  Dey  (1994)  derived  an  alternative  approach  to  estimating  the  integrated 
likelihood  tt (y),  as  discussed  towards  the  end  of  §2.  However,  their  estimator  can  be  sensitive 
to  the  choice  of  the  function  /(•),  and  can  also  suffer  from  instability  for  the  same  reasons 
as  the  standard  harmonic  mean  estimator  does.  In  §4  we  showed  that  their  approach  can 
be  combined  with  the  stabilization  method  proposed  here  for  improved  performance  of  the 
harmonic  mean  estimator. 

Another  application  of  the  proposed  stabilization  approach  includes  robust  linear  models 
(Andrews  and  Mallows  1974,  Carlin  and  Louis  1996).  The  robust  linear  model  has  an  error 
term  distributed  as  Zf\/\ 7,  where  Z  and  U  are  independently  distributed  as  Normal(0, 
precision  =  ip)  and  x2  with  u  degrees  of  freedom,  respectively.  The  standard  harmonic  mean 
estimator  can  have  infinite  variance.  A  stabilized  harmonic  mean  estimator  can  be  obtained 
by  integrating  out  the  denominator  U  (details  not  shown  but  available  from  the  authors). 

Hierarchical  models  which  involve  standard  distributions  may  be  good  candidates  for 
the  present  approach.  For  one  thing,  MCMC  is  well  understood  for  within-model  posterior 
simulation.  Furthermore,  the  integrations  required  for  dimension  reduction  may  be  solved 
analytically.  The  simplicity  of  the  resulting  stabilized  harmonic  mean  is  its  main  advantage. 

The  reversible-jump  MCMC(Green  1995)  is  a  specialized  algorithm  for  Bayesian  model 
selection.  Satagopan  and  Yandell  (1996)  used  this  method  to  estimate  the  number  of  QTLs 
(s)  in  the  statistical  genetics  problem  of  §3  by  approximating  the  posterior  model  proba¬ 
bilities  using  the  reversible-jump  MCMC  algorithm.  The  reversible  jump  MCMC  algorithm 
involves  additional  calculations  beyond  single-model  MCMC,  and  the  resulting  algebra  and 
computation  can  be  burdensome.  Further,  careful  implementation  of  the  algorithm  includes 
an  appropriate  choice  of  prior  distribution  for  s,  as  this  can  influence  rapid  mixing  of  the 
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chains  (Satagopan  and  Yandell  1996).  When  the  number  of  models  considered  is  not  too 
large,  as  for  example  in  our  QTL  example,  the  methods  described  here  enable  one  to  infer 
the  Bayes  factors  directly  from  single-model  MCMC  runs,  which  can  be  easier  to  implement 
than  reversible  jump  MCMC. 


Appendix  I:  Student’s  t 

Student  t 


Copying  Bernardo  and  Smith  (1994,  page  122), 


St(xjjU,  A,  a)  =  c 


1  +  —(x  -  g)2 
a 


-|  —  (q+1)/2 


where 


r((q  +  i)/2)  (y 
r(a/2)  r(l/2)  {a, 


1/2 


Multivariate  Student  t 

Using  the  notation  of  Bernardo  and  Smith  (1994,  page  139), 


Stn(x\g,  A ,a)  =  c 


1  H — ( x  —  g)T  X(x  —  g) 
a 


-(a+n)/ 2 


where 


e=  r((a  +  n)/2)  /2 

r(a/2)  (avr)n/2  v  ' 


x  and  n  are  of  dimension  n.  A  is  a  symmetric,  positive-definite  n  x  n  matrix,  and  a  >  0. 


Appendix  II:  Proof  of  equation  4 


Define 


Set 


f(»)  =  -ho)2  and  g(n)  =  ^(y  -  /i)2  . 


a(n)  —  1  + 


i  +  Hi*)  ' 
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It  can  be  easily  shown  that  the  maxima  of  the  continuous  function  a(/i)  occurs  at  fi*  — 
Ho  —  a/[n0(y  —  Ho)],  and  the  maximum  value  of  the  function  is 

°(^*)  —  1  H - b  ■ 

no 

Further  a(p)  — >  1  +  l/n0,  as  p  — *  ±oo.  The  expected  value  of  interest  can  be  written  as 

E{[^(y\^v}  a  / ^a^a/2+1^  +  f^)ya/2dy : 

where  [1  +  /(/i)]_“^2  is  proportional  to  a  t-density  of  the  form 

St(/x|/xo,  nQ{a  —  1) / ck,  a  —  1)  . 

Since  1  <  a(fx)  <  a(/x*),  the  integral  on  the  right  hand  side  is  finite  by  dominated  convergence 
theorem  when  a  >  1  and  n0  >  0. 


Appendix  III:  Proof  of  Theorem  1 


Define  a  =  h(0),  write  9  =  (a,  (3),  and  set 

1 


a  —  E 


[Tr(y\a)f 


y 


and 


b  =  E 


[n(y\9)f 


y 


Since  both  l/'R{y\a)  and  1  / 7r(j/| 6)  have  common  expectation  1/ir (y),  it  suffices  to  show  that 
a  <  b.  Expanding  6,  we  have 


where 


By  contrast, 


b  = 


-a 

a 


i 


Xs/ \a,P)]2 

1 


7T 


'^(y\^P)]2 

j  b(a)  Tr(a\y)  da 


(a,  /3\y )  d/3  da 
(P\ a,  y)  p(a\y )  d/3  da 


b{a)  =  J 


n(y\a,P)]: 


■  tt(/3| a,y)  dp. 


a  = 


J  a (a)ir (a\y)  da 
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where 


a(a )  = 


1 


[n{y\a)]2' 

Therefore,  it  is  sufficient  to  prove  that  a(a)  <  b(a )  for  all  a.  Simplifying  b(a),  we  have 

1 


b(a )  =  J 

-  s 


r  ,  |  R^(P\<x,v)dP 

[ir{y\a,P)]2 

1  Tr(y\a,P)n(f3\a)ir(a) 


/5)]2  7r(j/|a)7r(a) 

n(P\ a) 


d/5 


—  / 

wla)  J 


dp. 


n(y\a)  J  n(y\a,/3) 
Cancelling  one  factor  l/7r(j/|a),  we  have  a(a)  <  6(a)  if 


1 


vr(//ja)  ^  J  7r(j/|a,/3) 


< 


/ 


d/5. 


This  follows  by  Jensen’s  inequality  using  the  distribution  ir(P\a).  In  the  event  that  one  or 
another  of  the  integrals  diverges,  a{a)  <  6(a)  must  continue  to  hold. 


References 

Andrews,  D.  F.  and  C.  L.  Mallows  (1974).  Scale  mixtures  of  normality.  Journal  of  the 
Royal  Statistical  Society,  Series  B  36,  99-102. 

Carlin,  B.  P.  and  T.  A.  Louis  (1996).  Bayes  and  Empirical  Bayes  Methods  for  Data  Analysis, 
Chapter  6,  pp.  209-211.  Chapman  and  Hall. 

Chib,  S.  (1995).  Marginal  likelihood  from  the  Gibbs  output.  Journal  of  the  American 
Statistical  Association  90,  1313-1321. 

DiCiccio,  T.  J.,  R.  E.  Kass,  A.  E.  Raftery,  and  L.  Wasserman  (1997).  Computing  Bayes 
factors  by  combining  simulation  and  asymptotic  approximation.  Journal  of  the  American 
Statistical  Association  92,  903-915. 

Doerge,  R.  W.,  Z.-B.  Zeng,  and  B.  S.  Weir  (1997).  Statistical  issues  in  the  search  for  genes 
affecting  quantitative  traits  in  experimental  populations.  Statistical  Science  12,  195-219. 

Gelfand,  A.  E.  and  D.  K.  Dey  (1994).  Bayesian  model  choice:  asynptotics  and  exact 
calculations.  Journal  of  the  Royal  Statistical  Society,  Series  B  56,  501-514. 


18 


Gelman,  A.,  J.  B.  Carlin,  H.  S.  Stern,  and  D.  B.  Rubin  (1996).  Bayesian  Data  Analysis. 
Chapman  and  Hall. 

Green,  P.  J.  (1995).  Reversible  Markov  chain  Monte  Carlo  computation  and  Bayesian  model 
determination.  Biometrika  82,  711-732. 

Kass,  R.  E.  and  A.  E.  Raftery  (1995).  Bayes  factors.  Journal  of  the  American  Statistical 
Association  90,  773-795. 

Lewis,  S.  M.  and  A.  E.  Raftery  (1997).  Estimating  Bayes  factors  via  posterior  simulation 
with  the  Laplace- Metropolis  estimator.  Journal  of  the  American  Statistical  Association  92, 
648-655. 

Newton,  M.  A.  and  A.  E.  Raftery  (1994).  Approximate  Bayesian  inference  with  the  weighted 
likelihood  bootstrap.  Journal  of  the  Royal  Statistical  Society  B  56,  3-48. 

Raftery,  A.  E.  (1996).  Hypothesis  testing  and  model  selection.  In  W.  R.  Gilks,  S.  Richard¬ 
son,  and  D.  J.  Spiegelhalter  (Eds.),  Markov  chain  Monte  Carlo  in  practice,  Chapter  10,  pp. 
163-188.  Chapman  and  Hall. 

Satagopan,  J.  M.  and  B.  S.  Yandell  (1996).  Estimating  the  number  of  quantitative  trait 
loci  via  bayesian  model  determination.  In  Proceedings  of  the  Section  on  Biometrics.  Joint 
Statistical  Meetings,  Chicago. 

Satagopan,  J.  M.,  B.  S.  Yandell,  M.  A.  Newton,  and  T.  C.  Osborn  (1996).  A  Bayesian 
approach  to  detect  quantitative  trait  loci  using  Markov  chain  Monte  Carlo.  Genetics  Iff, 
805-816. 


19 


